Multiple dataset training Web Support #503
Could you explain this part? I didn't expect that anything would need to move. You could just do a simple download of each dataset from Girder, then point to the data in place without moving anything. Also, I expect you'd like to merge this before #487. I'm fine with that. Just to confirm, this will work with arbitrary dataset ids, right? They don't have to be siblings?
Besides my massive spelling mistake there (orangization), that was a bad choice of words for the explanation. I had assumed that the check of whether ground_truth is a directory was in there because of some legacy items where the
Yeah, that was the second part of my testing: I was training across different users' public folders. It just required that I manually create the array of dataset ids and call the endpoint, because there is currently no UI for it.
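The manual call described above could be sketched roughly as follows. This only builds the request and does not send it. The function name `build_train_request`, the `api_root` value, and the choice to send `folderIds` as a JSON body are assumptions made for illustration; the actual parameter layout of the endpoint is not shown in this thread.

```python
import json
import urllib.request
from typing import List


def build_train_request(
    api_root: str, folder_ids: List[str], token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to the training endpoint.

    Hypothetical helper. Assumption: the dataset ids travel as a JSON
    body under the key "folderIds"; the real endpoint may expect them
    as query parameters instead.
    """
    body = json.dumps({"folderIds": folder_ids}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_root.rstrip('/')}/viame/train",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Girder reads its auth token from this header.
            "Girder-Token": token,
        },
    )
```

The ids do not need to share a parent folder in this sketch; any list of dataset ids can be passed.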
jjnesbitt
left a comment
Just some minor things, but it looks good. I haven't tested locally yet.
For the record, the reason this is done is in case the Girder item has multiple files. If it does, it's a folder when downloaded. Otherwise it's just a file. Since we still use the
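A minimal sketch of the file-versus-folder check described above, assuming the item has already been downloaded and only a local path is available. The helper name `groundtruth_files` is hypothetical and not taken from this PR.

```python
from pathlib import Path
from typing import List


def groundtruth_files(downloaded: Path) -> List[Path]:
    """Return the files found at a downloaded item path.

    Hypothetical helper: a multi-file item arrives as a directory,
    a single-file item arrives as a plain file.
    """
    if downloaded.is_dir():
        # Multi-file item: collect every regular file, sorted for determinism.
        return sorted(p for p in downloaded.iterdir() if p.is_file())
    if downloaded.is_file():
        # Single-file item: the path itself is the file.
        return [downloaded]
    raise FileNotFoundError(f"nothing downloaded at {downloaded}")
```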
Co-authored-by: Jacob Nesbitt <jjnesbitt2@gmail.com>
detections = list(
    Item().find({"meta.detection": str(folderId)}).sort([("created", -1)])
)
detection = detections[0] if detections else None
Maybe refactor viame_detection.py _load_detections() helper function?
Eh, this can be done later.
Fixes #391
NOTE - Need the latest kitware/viame:gpu-algorithms-latest for the input_list to work properly.
* folderIds and updates the API in the relevant locations.
* -il input_folder_list.txt and the -it input_groundtruth_list.txt for specifying the data.
* organize_folder_for_training but removed the labels.txt stuff.
* --no-query is added to the groundtruth command so that it uses all types that would be in the labels.txt by default and the user is not prompted to accept.

I've tested by taking two small datasets with different track types in them and training on them. Then I ran the trained model on another small dataset and confirmed that it uses types from both datasets.
Additionally, I trained across different folders by using the /viame/train endpoint and manually specifying folderIds across different root folders and different public users. It trained successfully, and the resulting pipeline incorporated types across the different folders.
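As a rough illustration of the input list files mentioned in the description, the sketch below writes input_folder_list.txt and input_groundtruth_list.txt for several datasets. The helper name `write_input_lists` is hypothetical, and the one-path-per-line layout with matching line order is an assumption about the expected file format, not something confirmed in this PR.

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_input_lists(
    datasets: Iterable[Tuple[Path, Path]], out_dir: Path
) -> Tuple[Path, Path]:
    """Write the folder list and groundtruth list for training.

    Hypothetical helper. Assumption: each file holds one path per line,
    and line i of both files refers to the same dataset.
    """
    folder_list = out_dir / "input_folder_list.txt"
    groundtruth_list = out_dir / "input_groundtruth_list.txt"
    pairs = list(datasets)  # materialize once so both files share one order
    folder_list.write_text("".join(f"{folder}\n" for folder, _ in pairs))
    groundtruth_list.write_text("".join(f"{gt}\n" for _, gt in pairs))
    return folder_list, groundtruth_list
```

The two returned paths would then be handed to the training command through the -il and -it options named in the description.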